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1.  INTRODUCTION 

An  experimental  software  system  for  the  solution  of  a  class  of  non-linear, 
stationary  boundary-val ue  problems  is  currently  under  development  at  the  University 
of  Pittsburgh.  The  program  NFEARS  (Non-linear  Finite- Element  Adaptive  Research 
Solver)  is  a  further  development  of  the  program  FEARS  [1-3],  which  utilized  bilinear 
elements  to  solve  linear  elliptic  problems.  NFEARS  retains  the  functionality  of  the 
earlier  program,  but  incorporates  a  continuation  procedure  to  solve  non-linear 
problems,  using  biquadratic  Hermitian  elements.  The  NFEARS  design  properties  include 
the  following: 

(1)  The  system  constitutes  an  appl ications-independent  finite-element 
solver  for  a  certain  class  of  two-dimensional,  non-linear,  stationary, 

boundary-value  problems  defined  by  a  weak  mathematical  formulation. 

J 

(2)  Adaptive  approaches  are  employed  extensively.  The  a  posteriori 
error  indicators  developed  in  [2]  are  used  to  control  the  adaptive 
processes  and  to  provide  a  solution  with  near  optimal  error  within 
a  prescribed  cost  range. 
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(3)  In  the  system  design,  advantage  was  taken  of  the  inherent 
parallelism  and  modularity  of  the  finite  element  method.  In 
particular,  a  two-level  data  structure  has  been  employed  to 
take  maximum  advantage  of  the  parallelism  in  the  continuation 
process.  ''  ,  ; 

(4)  The  system  is  highly  modular  in  structure,  reflecting  not  only 
the  natural  separation  by  distinct  function,  but  also  the 
isolation  of  those  processes,  particularly  error  analysis,  which 
are  anticipated  to  be  of  the  greatest  experimental  interest.  In 
this  way,  the  migration  of  NFEARS  from  convenient  research  vehicle 
to  efficient  production  tool  will  be  gradual  and  controlled. 

Extensive  provisions  for  evaluating  the  performance  are  incorporated. 

A  principle  feature  of  (2)  is  an  adaptive  algorithm  for  mesh  refinrment  and/or 
derefinement  (or  simply  (de) refinement).  Briefly,  after  a  solution  has  been  obtained 
on  some  mesh,  error  indicators  are  computed  on  each  individual  element.  These 
indicators  are  used,  both  individually  and  in  patterns,  to  compose  an  estimate  of  the 
error  in  an  appropriate  norm,  and  to  specify  what,  if  any,  changes  are  to  be  made  to 
the  mesh.  The  (de) refinement  algorithm  in  essence  divides  elements  and/or  consoli¬ 
dates  others  so  as  to  achieve  a  more  equal  distribution  of  error  indicators. 

This  paper  presents  the  overall  structure  of  NFEARS.  The  weak  formulation  for 
the  admissible  class  of  problems  is  given  in  Section  2.  Sections  3  and  4  discuss 
the  design  criteria  of,  respectively,  the  control  and  data  structures  of  the  program. 


2.  MATHEMATICAL  BASIS  OF  THE  DESIGN 
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cations,  yet  narrow  enough  to  allow  for  easy  implementation  as  an  experimental 
vehicle.  Our  choice  was  a  fairly  general  class  of  non-linear,  stationary  boundary- 
value  problems  on  certain  two-dimensional  domains  which  admits  also  many  linear 
elliptic  problems  as  special  cases.  Throughout  the  design,  we  have  refrained  from 
using  approaches  which  would  limit  extension  to  more  general  problems. 

The  permissible  domains  are  of  the  same  type  as  in  FEARS  [1-3].  The  domain  n 
in  F  is  taken  to  be  an  open,  connected,  and  simply  connected  set  with  boundary 
:v,’.  We  denote  points  of  v.  by  x  :=  (xpX^)^.  The  system  will  have  two  modules, 
namely,  for  the  solution  of  problems  with  one  and  two  unknowns,  respectively.  In  the 
case  of  one  unknown  function,  we  seek  a  function  u(x)  defined  on  v  such  that 

(i)  u  is  a  stationary  point  of  the  functional 


(2-D 


(2.2) 


(2.3) 


are  invariants  with  respect  to  the  rotation  of  coordinate  axes. 

(ii)  If  r  ,  r1 ,  and  are  appropriately  chosen  subsets  of 
then  u  satisfies  boundary  conditions  of  the  form 


(2.4) 


u  =  A i h  ^ ( x )  +  X2h2(x)  on  rQ 
or 

§£  =  A-jk-j  (x)  +  X2k2(x)  on  r1 
or 

u  +  =  Ai (hi+ki )  +  X2(h2+k2)  on  r2 


(2.5) 


(2.6) 


where  n  is  the  outward  pointing  unit  normal  vector  on  lift. 


Similarly,  in  the  case  of  two  unknown  functions,  we  seek  a  vector  function 
u  =  (u,,u2)T  defined  on  ft,  such  that  again  conditions  of  the  form  (i)  and  (ii) 
hold.  In  (i)  the  functional  (2.1)  is  replaced  by 
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This  class  of  problems  is  fairly  general  and  includes,  in  particular,  most  of 
the  basic  problems  in  elasticity  theory.  Two  parameters,  Aj  and  A,,,  are  incor¬ 
porated  into  the  formulation,  hence  the  equilibrium  surface  of  the  problem  under 
study  is  two-dimensional. 

As  in  FEARS,  the  domain  5  is  defined  as  a  union  of  subdomains  9. . ,  upon  each 
of  which  is  introduced  a  finite-element  mesh.  In  order  to  minimize  the  amount  of 
redundant  data,  we  work,  as  far  as  possible,  with  non-intersecting  open  subdomains. 
The  set  of  discrete  points  necessary  to  close  the  union  of  these  open  subdomains  is 
treated  as  a  separate  case.  Figure  1  illustrates  how  this  process  works  for  a 
simple  (straight! ined)  domain.  Here  the  domain  Q  is  decomposed  into  a  union  of 

(1)  Three  2-dimensional  open  subdomains  (or  2-subdomains) ,  here 
indexed  as  2-1  through  2-3, 

(2)  Ten  1 -dimensional  open  line  segments  (or  1 -subdomains)  on  the 
boundaries  of  the  2-subdomains,  here  indexed  as  1-1  through  1-10, 
so  that  each  1-subdomain  is  on  the  boundary  of  at  most  2 
2-subdomains,  and, 

(3)  A  set  of  eight  discrete  points,  or  O-subdomains,  here  indexed  as 
0-1  through  0-8,  which  close  the  decomposition.  Each  0-subdomain 
is  thus  on  the  boundary  if  at  most  4  2-subdomains  and  at  most  4 
1- subdomains. 


For  the  design  of  a  reasonably  efficient  mesh  refinement  algorith,  further 
restrictions  on  the  choice  of  the  subdomains  are  desirable.  We  assume  that  the 
closure  q7  of  2-subdomain  is  a  diffeomorphic  image  of  the  closed  unit  square, 
Q~,  on  which  a  simple  hierarchy  of  subdivisions  can  be  defined.  Moreover,  the 
closure  of  each  1-subdomain  is  assumed  to  be  a  diffeomorphic  image  of  the  unit 
interval  [0,1]. 
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The  mesh  on  each  ft,,  consists  of  curvilinear  elements  which  are  first  defined 

on  Q  and  then  mapped  onto  ft..  Then  an  admissible  mesh  on  Q  is  defined  as  a 

01  o 

collection  ill  of  closed  squares  Q  in  Qq  which  are  generated  by  recursive  appli¬ 
cation  of  the  two  rules: 

(1)  The  mesh  M  =  M  consisting  only  of  Q  itself  is  admissible. 

o  o  (2.9) 

(2)  If  M  is  an  admissible  mesh  on  Q^,  then  the  mesh  M'  that  is 
obtained  from  M  by  subdividing  any  one  closed  square  Q  of 

M  into  four  congruent  squares  of  half  the  side  length  of  Q 
is  admissible. 

A  typical  mesh  on  generated  in  this  way  is  shown  in  Figure  2.  (For 
clarity,  some  of  the  mid-side  and  center  nodes  of  the  biquadratic  Hermitian  element 
have  been  omitted.)  The  refinement  introduces  two  types  of  intersection  point.  We 
call  a  point  nagiLlcv i  if  it  is  a  corner  of  all  undivided  squares  incident  with  it; 
all  others  are  termed  iAAzguZaA.  All  points  on  the  boundary  of  are  always 

regular.  In  Figure  2,  the  irregular  points  are  marked  by  small  circles.  In  order 
to  obtain  continuity  of  the  solution  at  irregular  points,  we  constrain  the  solution 
to  take  on  values  interpolated  from  nearby  regular  points.  Thus  the  irregular 
point  no  longer  carry  independent  information  of  their  own,  and  must  be  treated 
separately.  In  Figure  2,  the  elemental  stiffness  matrix  K  if  the  hatched  element 
depends  on  the  5  numbered  regular  points.  If  Kq  represents  the  usual  stiffness 
matrix  of  that  element,  then  K  :=  L^KQL,  where  L  is  an  interpolation  matrix 
whose  entries  depend  only  on  the  mesh  M  on  Q^. 

The  decomposition  of  ft  makes  evident  a  natural  parallelism  in  the  incre¬ 
mental  construction  and  solution  of  the  macro-stiffness  system.  The  overall  macro¬ 
stiffness  matrix  has  the  form 


I 
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where  A^  corresponds  to  the  interior  of  the  2-subdomain  sk  ,  Ci  corresponds  to 
the  0-  and  1-subdomains  on  the  boundary  of  Gft  ,  and  B  contains  information  on 
all, but  only,  the  0-  and  1 -subdomains .  Using  a  symmetric-decomposition  solution 
method,  we  may  perform  the  following  steps  for  each  ft.  in  parallel 


(i)  Generate  the  matrices  Ai  and  C.. 

(ii)  Generate  the  contributions  to  the  matrix  B  and  send  them  to 
a  designated  linear  sol ver-process. 

(2.11) 

(iii)  Compute  the  decomposition  of  A^  and  the  corresponding  modifi¬ 
cation  of  Ci . 

(iv)  Compute  the  resulting  modification  of  B  and  send  it  to  the 
solver-process. 


A  similar  procedure  applies  to  the  processing  of  the  right-hand  side.  (Less 
optimistically,  the  ft.  may  be  processed  serially,  but  even  here,  the  inherent 
parallelism  implies  that  the  order  of  processing  is  irrelevant.)  Upon  completion 
of  the  parallel  construction/decomposition,  the  matrix  B  is  decomposed  and  the 
corresponding  portion  of  the  solution  obtained.  The  remainder  of  the  solution,  in 
the  interiors  of  the  ft.. ,  is  obtained  by  back-substitution  for  each  ft. ,  and  is 
once  again  a  parallel  process. 

The  ability  of  this  scheme  to  deal  with  multiple  right-hand  sides  without 
altering  the  matrix  decomposition  is  of  particular  importance  in  the  continuation 
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process,  since  it  allows  us  to  use  modified  rather  than  full  Newton  refinement.  The 
macro-stiffness  matrix  is  calculated  only  once  in  each  continuation  step,  at  the 
starting  solution  point  for  the  step.  The  refinement  of  the  predicted  solution  then 
uses  this  same  matrix,  resulting  in  a  marked  improvement  in  efficiency.  Full  Newton 
refinement  remains  possible,  of  course,  if  it  is  needed. 

3.  NFEARS  CONTROL  STRUCTURE 

To  discuss  the  control  and  data  structures  of  the  program  NFEARS,  it  is  first 
necessary  to  discuss  the  philosophy  of  the  error  analysis  in  the  continuation 
process.  The  mathematical  basis  of  the  error  analysis  is  discussed  in  [4],  and  the 
various  aspects  of  its  implementation  are,  at  root,  the  raison  d'etre  for  NFEARS  as 
a  research  tool  in  the  first  place.  For  this  reason,  the  error  analysis  portions 
must  be  implemented  so  as  to  provide  maximum  flexibility  and  eash  of  experimentation. 
However,  error  analysis  in  NFEARS  does  not  occupy  a  readily  isolated  module,  but 
rather,  various  aspects  of  it  occur  throughout  the  code.  Those  aspects  of  NFEARS 
which  are  not  under  such  intense  research  scrutiny,  such  as  continuation  per  se, 
Newton  refinement  and  linear  solution,  are  still  of  sufficient  complexity  and  cost 
that  efficient  coding  is  essential.  Thus,  the  exigencies  of  error  analysis  determine 
not  only  that  portion  of  the  code,  but  dictate  design  criteria  for  all  of  NFEARS. 

To  the  extent  that  speed  and  ease  of  alteration  of  code  are  conflicting  design 
goals,  the  primary  design  criterion  for  any  portion  of  code  will  largely  rest  on 
whether  that  code  is,  or  is  not,  part  of  the  error  analysis. 

The  global  structure  of  the  NFEARS  design  is  such  that,  once  some  particular 
mesh  has  been  specified  and  a  solution  point  on  that  mesh  calculated,  several 
continuation  steps  may  occur  before  the  error  analysis  deems  mesh  alterations 
necessary.  Such  changes  are  dictated  by  the  error  indicators  calculated  for  each 
element  at  each  step,  and  will  involve  either  refinement  of  the  mesh  if  the  errors 
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subordinate  to  the  executive. 

The  mesh  management  module  is  responsible  for  the  creation  and  maintenance  of  a 
disk  data  structure  reflecting  the  current  mesh  and  current  solution.  As  noted, 
changes  to  the  mesh  are  anticipated  to  be  relatively  rare  in  relation  to 
continuation  steps  on  a  given  mesh.  However,  when  mesh  changes  do  occur,  their 
effect  on  the  total  NFEARS  data  structure  will  be  extensive.  The  data  structure 
used  to  store  mesh  information  is  a  straightforward  extension  of  the  tree  structure 
used  in  FEARS,  which  is  described  in  [3]).  Additionally,  whenever  the  mesh  is 
changed,  a  Transient  Data  Set  (see  below)  is  created  afresh,  containing  mesh- 
dependent  data  in  a  form  most  efficiently  available  to  the  continuation  module. 

Within  each  continuation  step,  error  indicators  are  calculated  on  each  element, 
as  are  summary  error  indicators  on  each  2-subdomain  and  for  the  region  as  a  whole. 
These  indicators  depend  on  the  new  solution  point,  and  are  calculated  as  soon  as  the 
new  solution  is  available.  Their  purpose  is  two-fold,  and  it  is  important  to 
delineate  the  two  functions.  On  the  one  hand,  it  must  be  ascertained  whether  the 
current  mesh  has  become  untenable,  and  must  be  (de)refined.  This  is,  in  essence,  a 
go-no-go  decision  to  be  made  after  each  continuation  step.  It  depends  not  simply 
on  individual  error  indicators  but  on  patterns  of  errors  within  each  2-subdomain. 

This  error  pattern  analysis  is  thus  a  submodule  of  the  FMC  process. 

On  the  other  hand,  once  the  current  mesh  has  been  deemed  unacceptable,  it  still 
remains  to  specify  (and  effect)  precisely  what  changes  must  be  made  to  the  mesh. 

This  logically  separate  step  occurs  in  the  (de) refinement  specification  module.  This 
module  is  yielded  control  only  after  an  FMC  episode  has  been  terminated  by  directive 
from  the  error  pattern  analysis.  In  turn,  the  mesh  management  module  will  then 
effect  the  specified  changes,  and  the  executive  will  calculate  a  new  starting 
solution  and  begin  another  FMC  episode. 

Apart  from  the  error  pattern  analysis  submodule,  the  FMC  module  poses  somewhat 
the  converse  design  criteria  than  the  error  analysis.  Continuation  on  a  fixed  mesh 
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has  already  been  researched  (see  PITCON  [5,6])  and,  subject  to  minor  and  anticipated 
variations  in  technique,  is  not  considered  to  be  of  major  research  interest  in 
NFEARS.  This  module,  then,  as  well  as  the  linear  solver  and  user-function  evaluation 
submodules  subordinate  to  it,  are  optimized  for  speed,  even  if  at  some  expense  in 
size,  clarity  and  flexibility.  Of  particular  importance  is  that  mesh-dependent 
data,  unchanging  during  several  continuation  steps,  is  calculated  only  once  and  is 
available  in  a  sequential,  unstructured  stream.  This  data,  the  Transient  Data  Set, 
is  described  in  the  Section  4  below. 

•Repeated  continuation  steps  within  an  FMC  episode  will  occur  until  one  of  the 
following  sequential  conditions  arises: 

(1)  The  target  values  of  the  parameters  and  are  achieved, 
with  no  contro-indications  in  terms  of  error  or  cost  control 
measures, 

(2)  Any  of  the  cost-control  parameters  are  violated,  such  as  the 
maximum  number  of  continuation  steps  or  the  minimum  acceptable 
step  size, 

(3)  The  error  pattern  analysis  indicates  that  the  current  mesh  has 
become  untenable,  requiring  refinement  in  some  portions  of  the 
region  and/or  derefinement  in  others. 

(4)  Unanticipated  serious  errors,  such  as  the  singularity  of  the 
Jacobian/continuation  matrix. 

At  the  conclusion  of  an  FMC  episode,  control  is  yielded  either  to  the  (de) refinement 
specification  and  mesh  refinement  modules  for  mesh  (de)refinement ,  or  the  the 
executive  module  to  terminate  the  run. 

As  in  PITCON,  a  single  continuation  step  consists  of  a  controlled  sequence  of 


six  sub-modules,  namely 
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(1)  The  calculation  of  a  tangent  vector  at  the  current  solution  point, 

(2)  The  selection  of  the  new  continuation  variable, 

(3)  The  selection  of  the  step  size, 

(4)  The  construction  of  the  predicted  solution  and  the  evaluation  of 
the  functional  (2.1)  at  this  point  via  user-supplied  functions, 

(5)  Modified  (optionally  full)  Newton  refinement  of  the  predicted 
solution,  including  the  calculation  of  the  elemental  and  summary 
error  indicators,  and 

(6)  The  error  pattern  analysis  on  the  error  indicators  to  make  the 
go-no-go  decision  on  mesh  (de) refinement. 

Notr>  that  the  failure  of  the  Newton  refinement  to  converge  will  not,  by  itself, 
terminate  on  FMC  episode;  rather,  it  would  simply  indicate  the  need  for  a  smaller 
step.  Cost  control  restrictions  would  then  terminate  this  process  if  needed. 

The  linear  solver  and  user-supplied  function  evaluation  modules  present  no 
unusual  design  criteria  other  than  as  noted  above. 

4 .  NFEARS  DATA  STRUCTURE 

The  data  structure  for  NFEARS  is  composed,  is  broad  terms,  of  two  distinct 
structures,  the  permanent  data  set  ( POS )  and  the  transient  data  set  (TDS).  This 
division  directly  reflects  the  corresponding  division  of  an  NFEARS  execution  run 
into  FMC  episodes  punctuated  by  mesh  re-specification  and  (de) refinement.  The 
distinction  between  permanent  and  transient  data  is  precisely  the  distinction  of  data 
that  transcends  any  particular  mesh  versus  data  that  is  dependent  on  a  given  mesh. 

The  importance  of  the  distinction  lies  in  the  fact  that  the  PDS  is  the  primary 
structure  for  the  mesh  specification  and  management  modules  (which  are  o  .imized 
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for  speed). 

^  The  PDS  contains  data  that  is  intended  to  survive  not  only  a  given  mesh 

selection,  but  even  the  termination  of  the  NFEARS  execution.  It  is  created 
immediately  upon  successfully  editing  user  input,  and  contains  all  data  regarding 
m  the  geometry  of  the  region,  as  well  as  user-supplied  run-control  and  cost-control 

parameters.  In  the  course  of  the  run,  it  is  augmented  with  the  current  mesh, 
solution  point,  and  error  indicators.  These  are  stored  in  a  flexible  tree 
structure  (similar  to  that  in  [3]),  enabling  mesh  (de) refinement  to  be  made  at  a 
reasonable  cost.  Also  stored  in  the  PDS  are  such  historical  data  as  may  be  needed 
for  nummary  reports  to  trace  the  events  of  an  NFEARS  run. 

Once  the  PDS  has  been  created  from  user  input,  and  some  mesh  specified  with  a 
corresponding  solution  point,  the  execution  of  NFEARS  may  be  interrupted  without 
loss  of  data,  and  resumed  "in  place"  at  some  later  date.  This  "dump/restart" 
gj  capability  may  be  used  for  interim  analysis  of  results,  post-processing,  or  simply 

to  alleviate  machine-time  restrictions.  It  also  provides  the  ability  for  the  user 
to  follow  different  curves  on  the  solution  surface,  starting  from  a  common  point. 

P  The  TDS,  conversely,  does  not  survive  any  instance  of  mesh  (de) refinement. 

It  is  created  in  its  entirety  by  the  mesh  management  module  whenever  the  mesh  is 
changed,  and  contains  all  data  from  the  PDS  necessary  to  do  continuation  on  the 
current  mesh.  It  is  used  by  the  FMC  process,  mostly  on  a  read-only  basis.  It 
consists  of  some  core-resident  portions  relating  to  the  0-  and  1-subdomains,  with 
the  remainder  placed  in  blocks  of  a  sequential  disk  file.  Each  block  corresponds 
to  one  2-subdomain,  and  the  blocks  are  processed  either  in  parallel,  or  serially, 
independent  of  order.  Input  buffering  overlap  on  the  TDS  is  tuned  to  keep  disk 
overhead  to  a  minimum. 

The  contents  of  the  TDS  for  a  given  mesh  includes  the  following  types  of  mesh- 
dependent  data : 


(1)  Integration-related  data  for  each  element  and  boundary  lino 
segment,  such  as  quadrature  coordinates  and  the  determinant  of 
the  Jacobian  if  the  isoparametric  domain  transformation, 

(2)  Index  information  needed  for  the  construction  of  the  linear 
system,  especially  in  the  assembly  of  the  Jacobian  matrix  of 
the  function  (2.1 ) , 

(3)  The  interpolation  matrix  L  by  which  the  solution  values  at 
irregular  nodes  are  expressed  in  terms  of  regular  nodes,  and 

(4)  Pivoting  strategies  for  the  incremental  decomposition  and  solution, 
by  2-subdomain,  of  the  linear  system. 

Some  data  in  the  TDS  is  organized  by  element,  some  by  node;  minor  repetition  of 
data  occurs  in  a  few  cases.  In  addition  to  the  TDS,  tabulations  of  shape  function 
evaluations  and  derivatives  for  the  biquadratic  Hermitian  element  are  stored  in  the 
PDS,  again  to  obviate  repetitive  calculation  within  the  FMC  process. 
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